Abstract

Releasing the exact degree sequence of a graph for analysis may violate 
privacy. However, the degree sequence of a graph is an important sum- 
mary statistic that is used in many statistical models. Hence a natural 
starting point is to release a private version of the degree sequence. A 
graphical degree partition is a monotonic degree sequence such that there 
exists a simple graph realizing the sequence. Ensuring graphicalness of the 
released degree partition is a desirable property for many statistical infer- 
ence procedures. We present an algorithm to release a graphical degree 
partition of a graph under the framework of differential privacy. Unlike 
previous algorithms, our algorithm allows an analyst to perform meaning- 
ful statistical inference from the released degree partition. We focus on the 
statistical inference tasks of existence of maximum likelihood estimates, 
parameter estimation and goodness of fit testing for the random graph 
model where the degree partition is a sufficient statistic, called the beta 
model. We show the usefulness of our algorithm for performing statistical 
inference for the beta model by evaluating it empirically on simulated and 
real datasets. As the degree partition is graphical, our algorithm can also 
be used to release synthetic graphs. 

1 Introduction 

Privacy is a growing problem due to the large amount of data being collected
by various agencies. A lot of data is being collected in the form of graphs 
where the sensitive information includes not only individual records but also 
relationships between them. Analysis of such graph data can be very useful for 
advancement of research in many fields, but free access to such data must be 
limited due to obvious privacy concerns. One of the central goals of privacy 
research is to enable useful statistical analysis of such data while preserving 
privacy. 

One property of graphs that has been given a lot of importance in the
literature on random graph models is their degree sequence. Although there is
evidence that the degree sequence alone does not capture all the structural
information in a graph (see, for example, [15]), in many cases the only
information available is that of the degrees of a graph. Every other structural
property of the graph is then estimated from a random graph model. For example,
in epidemiological studies of sexually transmitted disease [8], the survey
collects information on the number of sexual partners of an individual. In such
cases, a natural starting point is to release the degree information of a graph
in a private manner.

In this paper, we study the problem of releasing a graphical degree sequence 
of a graph that preserves the privacy of individual relations while allowing an
analyst to perform standard statistical inference with the released data. Our 
algorithm satisfies the rigorous definition of privacy called differential privacy 
[3]. Our approach in releasing degree sequences can be seen in two different 
ways. In the context of an interactive privacy scheme, our algorithm can be seen
as providing a private answer to the query of degree sequence of a graph. This 
enables the analyst to fit all those models whose sufficient statistics are func- 
tions of the degree sequence. In the context of synthetic graphs, our algorithm 
can be regarded as generating synthetic graphs from the conditional (uniform) 
distribution of all graphs with a given degree sequence. 

2 Previous Work 

A considerable amount of research has been done on the privacy of graph data
in the computer science community; for a partial survey of results on privacy
techniques, see [15]. Most of these techniques do not provide rigorous
guarantees under arbitrary attacks, which the notion of differential privacy
[3] does provide. There has been some work on protecting graphs using the
notion of differential privacy. In [10], the authors show how to release the
number of triangles in a graph in a private manner. In [7], the authors present
algorithms to release different subgraph statistics in a private manner.
However, neither of them considers degree sequences explicitly. Also, none of
these works evaluates the usefulness of the output of their algorithms for
performing statistical inference.

In [5], the authors present an algorithm to release the degree distribution
of the graph in a differentially private manner. They do so by asking for the
degree partition (an ordered degree sequence) of a graph. The degree partition
has additional consistency constraints which are used to post-process the
answer. The authors show that one can release a very accurate estimate of the
degree distribution by exploiting these constraints. However, the output from
their algorithm is not directly usable for carrying out basic statistical
inference tasks, as we illustrate in Section 6. Specifically, the degree
partition released by their algorithm is not suitable for model testing
applications and maximum likelihood estimation. The main issue is that the
degree partition released by their algorithm need not be graphical, i.e., there
may not exist a simple graph whose degree partition corresponds to the released
partition. Graphicalness is a desirable property for many statistical
applications where exactly specified degrees are desired, such as generating or
enumerating random graphs for model-testing applications. Such applications are
very common in statistical analysis of networks, and in fact form a central
core of inference procedures. If the output is not a graphical degree sequence,
standard inference procedures such as conditional goodness of fit tests cannot
be used and the maximum likelihood estimators may fail to exist for the private
version of the released degree partition, even in the cases where the original
degree partition does not suffer from these issues. For more details, see
Section 5.

We address these issues in our paper by presenting an algorithm to release 
the degree partition of a graph under differential privacy. The output from our 
algorithm can be used directly to perform maximum likelihood estimation and 
model testing of the beta model of random graphs (defined in Section 5). We
build upon the work of [5] and include an additional post-processing step to
ensure that the released degree sequence is graphical. This work also serves
to illustrate the point that simply ensuring the closeness of the L1 or L2
distance between the released and the original data may not be sufficient for
statistical applications, even though this has been a common measure of utility
in most work on differential privacy. Another contribution of the paper, which
may be of independent interest, is a simple and efficient algorithm to test for
the existence of maximum likelihood estimates of the beta model. In general, it
is a difficult problem to characterize explicitly testable conditions under
which the maximum likelihood estimators exist for different models. For more
details on the problem of existence of the mle, see [TH] and references
therein.

3 Preliminaries 

This section introduces the preliminaries and the notation used in the paper. 
Let $G_n$ denote a graph on $n$ nodes and let $m$ be the number of edges in the
graph. A simple graph is a graph with no self-loops or multiple edges. All the
graphs considered in this paper are simple. Let $\mathcal{G}$ denote the set of
all simple graphs on $n$ nodes. The distance between two graphs $G$ and $G'$ is
defined as the number of edges on which the graphs differ and is denoted by
$d(G, G')$. Next, we define differential privacy for graph data.

3.1 Differential Privacy 

Differential privacy for graphs is defined to protect edges in a graph (or rela- 
tionships between nodes), as the following definition illustrates: 

Definition 1 (Edge Differential Privacy). Let $\epsilon > 0$. A randomized
algorithm $A$ is $\epsilon$-edge differentially private if for all graphs $G$
and $G'$ such that $d(G, G') = 1$ and for all output sets $S$,

$$P(A(G) \in S) \le e^{\epsilon} P(A(G') \in S).$$

Roughly, edge differential privacy requires that the outputs of the algorithm
$A$ on two neighboring graphs be close to each other. A basic algorithm to
release the output of any function $f$ under edge differential privacy is the
Laplace Mechanism [3], which adds Laplace noise proportional to the global
sensitivity of $f$.

Definition 2 (Global Sensitivity). Let $f : \mathcal{G} \to \mathbb{R}^k$. The
global sensitivity of $f$ is defined as

$$GS(f) = \max_{d(G, G') = 1} \|f(G) - f(G')\|_1$$

where $\|\cdot\|_1$ is the $L_1$ norm.

Theorem 1 (Laplace Mechanism, [3]). Let $f : \mathcal{G} \to \mathbb{R}^k$. Let
$Z_1, \ldots, Z_k$ be independent and identically distributed Laplace random
variables with scale parameter $b = GS(f)/\epsilon$ (standard deviation
$\sqrt{2}\, GS(f)/\epsilon$). Then the algorithm which on input $G$ releases
$f(G) + (Z_1, \ldots, Z_k)$ is $\epsilon$-differentially private.

One nice property of differential privacy is that any function of the output
of a differentially private algorithm is also differentially private, as the
following lemma states.

Lemma 1 (Post-processing, [21 HI]). Let $f$ be the output of a differentially
private algorithm and $g$ be any function. Then $g(f(G))$ is also
differentially private.

In the next section, we define the degree sequence and degree partition of a 
graph. 



3.2 Degree sequence of a graph 

Let $G_n$ be an undirected simple graph on $n$ nodes with $m$ edges. The degree
$d_i$ of a node $i$ is the number of nodes connected to it.

Definition 3 (Degree Sequence, Degree partition and Degree distribution).

The degree sequence $d$ of a graph is defined as the sequence of degrees of
each node. The degree sequence ordered in non-decreasing order is called the
degree partition and is denoted by $\bar{d}$. The degree distribution of a
graph, denoted by $p$, is the sequence $\{p_k, k = 1, \ldots, n - 1\}$ where
$p_k$ is the number of nodes of degree $k$.

There can be more than one graph associated with the same degree sequence.
Let $\mathcal{G}(d)$ be the set of simple graphs on $n$ vertices with degree
sequence $d$. Also, not every integer sequence of length $n$ is a degree
sequence. Sequences that can be realized by a simple graph are called graphical
degree sequences. Graphical degree sequences have been studied in depth and
admit many characterizations. One of the characterizations that is useful for
our purposes is given below. The set of all degree sequences of size $n$ is
denoted by $DS_n$. The set of all degree partitions of size $n$ is denoted by
$DP_n$.

Theorem 2 (Havel-Hakimi). Let $d = \{d_1, \ldots, d_n\}$ be a non-increasing
sequence of integers. Then $d \in DS_n$ iff
$c = \{c_1, \ldots, c_{n-1}\} \in DS_{n-1}$, where

$$c_i = \begin{cases} d_{i+1} - 1 & \text{if } 1 \le i \le d_1 \\ d_{i+1} & \text{if } d_1 + 1 \le i \le n - 1 \end{cases}$$

Theorem 2 provides an algorithmic characterization for testing whether a
given sequence is graphical. This description can also be used to create a
realization of a graph with the given graphical degree sequence. The next
section contains the main result of the paper. We present an algorithm that
releases a differentially private graphical sequence for a given degree
sequence. The algorithm also produces a graph associated with the released
degree sequence. This graph can be randomized to produce a point from
$\mathcal{G}(d)$.
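The recursive test described by the Havel-Hakimi theorem can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation; the function name `havel_hakimi` and the edge-list output format are assumptions made for the example. It repeatedly removes the node with the largest remaining degree and connects it to the nodes with the next largest remaining degrees, returning a realization when the sequence is graphical.

```python
def havel_hakimi(degrees):
    """Return an edge list realizing `degrees`, or None if not graphical.

    Hypothetical helper: nodes are labeled 0..n-1 in input order.
    """
    # Each entry is [remaining degree, node label].
    residual = [[d, v] for v, d in enumerate(degrees)]
    if any(d < 0 for d, _ in residual):
        return None
    edges = []
    while residual:
        # Put the node with the largest remaining degree first.
        residual.sort(key=lambda pair: -pair[0])
        d, v = residual[0]
        if d == 0:
            return edges  # all remaining degrees are zero
        rest = residual[1:]
        if d > len(rest):
            return None  # not enough other nodes to connect to
        for pair in rest[:d]:
            if pair[0] == 0:
                return None  # a required neighbor has no degree left
            pair[0] -= 1
            edges.append((v, pair[1]))
        residual = rest  # node v is now fully processed
    return edges
```

For example, `havel_hakimi([3, 3, 2, 2, 2])` returns six edges whose degrees match the input, while `havel_hakimi([3, 3, 1, 1])` returns `None`.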

4 Algorithm to release graphical degree parti- 
tions 

A straightforward way to release the degree sequence of a graph is to use the
Laplace mechanism. Proposition 1 gives the global sensitivity of $d$,
$\bar{d}$ and $p$. Using this proposition, one can release the degree sequence
by adding Laplace noise with scale parameter $b = 2/\epsilon$. By Theorem 1,
this algorithm is $\epsilon$-differentially private.

Proposition 1. The global sensitivity of the degree sequence and of the degree
partition of a graph is 2.
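The following Python sketch illustrates the two facts used above: adding one edge changes the degree sequence by 2 in L1 distance, and the degree partition can be perturbed with Laplace noise of scale 2/ε. It is only an illustration under stated assumptions: the function names, the edge-list input format and the `seed` argument are hypothetical, and the Laplace variate is drawn as the difference of two exponential variates using the standard library.

```python
import random

def degree_sequence(n, edges):
    """Degrees of nodes 0..n-1 for an undirected simple graph."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is a Laplace(0, scale) random variable.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_degree_partition(n, edges, epsilon, seed=None):
    """Laplace mechanism applied to the (non-decreasing) degree partition."""
    rng = random.Random(seed)
    partition = sorted(degree_sequence(n, edges))
    scale = 2.0 / epsilon  # global sensitivity 2, as in Proposition 1
    return [d + laplace_noise(scale, rng) for d in partition]
```

The output is a vector of real numbers that is in general neither monotone nor integer valued, which is why the post-processing steps discussed next are needed.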

It is possible to release the degree partition of a graph with a smaller
magnitude of noise, as illustrated by [6]. The main idea is to introduce
consistency constraints in the query $q$ which hold for any graph $G$. Let the
constrained query be $q_c$. The differentially private answer to the query
$q_c(G)$ need not satisfy the constraints. Hence we can post-process the answer
to $q_c(G)$ so that it satisfies the constraints. Note that in general this
approach need not improve the accuracy of the estimated answer, because the
sensitivity of $q$ may differ from the sensitivity of $q_c$. However, there are
many naturally occurring consistency constraints. For example, if the query
asks for a degree sequence, we expect the answer to be a degree sequence. We
can add more constraints to the query. For example, we can ask for the degree
partition. This query has two constraints: the answer must be a monotonic
sequence of nonnegative integers and it must also be a degree sequence. It
turns out that the global sensitivity of these two queries is the same.
Moreover, any kind of post-processing does not violate differential privacy,
by Lemma 1.

If we let $\bar{d}$ be the query that asks for the degree partition, the
constraint that the differentially private answer to $\bar{d}$ needs to satisfy
can be written as the geometric constraint that the answer lies in $DP_n$. If
$z$ is the output from the Laplace mechanism, then the post-processing step is
equivalent to solving the following optimization problem:

$$s = \operatorname*{argmin}_{d \in DP_n} \|d - z\|_1 \qquad (1)$$

We propose a two-step solution to the optimization problem (1). The first step
is to compute the nearest non-decreasing integer sequence to the output of the
Laplace mechanism, i.e., to find the L1 projection of $z$ onto the set of
non-decreasing integer sequences, denoted by $\mathbb{Z}^n_{\le}$. In the
second step, we find the nearest degree partition to the given non-decreasing
sequence of integers. The first step is the well-known case of isotonic
regression, and was also the approach used by [5]. We present an algorithm to
solve the second step of the proposed procedure. Specifically, we present an
algorithm to find a degree sequence $d$ that is closest to a given sequence of
real numbers. We then show that if the given sequence is ordered, then the
algorithm outputs the closest (in terms of the L1 distance) degree partition.
The proposed mechanism is shown in Algorithm 1. Step 3 of the algorithm is the
well-known case of L1 isotonic regression and can be solved efficiently, see
[13] and [12]. In the next section, we present an algorithm to solve step 4.



Algorithm 1 Input: degree partition $\bar{d}$, privacy parameter $\epsilon$
1: Sample $n$ independent Laplace random variables $e_i$ with scale $b = 2/\epsilon$
2: Let $z_i = \bar{d}_i + e_i$ for $i = 1, \ldots, n$
3: Let $c = \operatorname*{argmin}_{u \in \mathbb{Z}^n_{\le}} \|u - z\|_1$
4: Let $s = \operatorname*{argmin}_{d \in DP_n} \|d - c\|_1$
5: return $s$
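Step 3 of Algorithm 1 is an L1 isotonic regression. One common way to compute such a fit is the pool-adjacent-violators scheme with block medians in place of block means; the sketch below uses that technique as an illustration and is not taken from the paper or its references. The names `lower_median` and `l1_isotonic` are hypothetical. The fitted values are medians of input values, so they are integers only when the input is; a caller who needs an integer sequence could round the noisy values first, and whether that coincides exactly with the integer projection in step 3 is not checked here.

```python
def lower_median(values):
    """Smallest minimizer of the sum of absolute deviations."""
    ordered = sorted(values)
    return ordered[(len(ordered) - 1) // 2]

def l1_isotonic(z):
    """Non-decreasing fit to z with small L1 error (PAVA with medians)."""
    blocks = []  # each block holds the input values pooled together
    for value in z:
        blocks.append([value])
        # Pool adjacent blocks while their medians violate monotonicity.
        while len(blocks) > 1 and lower_median(blocks[-2]) > lower_median(blocks[-1]):
            last = blocks.pop()
            blocks[-1].extend(last)
    fit = []
    for block in blocks:
        fit.extend([lower_median(block)] * len(block))
    return fit
```

For instance, `l1_isotonic([3, 1, 2])` pools the first two entries and returns `[1, 1, 2]`, which is at L1 distance 2 from the input.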



4.1 Optimization over DS n 

In this section, we present an algorithm that finds a degree sequence closest
to a given sequence of real numbers. We define "closeness" in terms of the
L1 distance. The motivation for using the L1 distance is as follows. Assume we
observe $n$ random variables $z_i$, $i = 1, \ldots, n$, such that
$z_i = d_i + e_i$, where $e_i \sim \mathrm{Lap}(0, b)$ for $i = 1, \ldots, n$
and $d = \{d_i\} \in DS_n$ are the unknown parameters. The maximum likelihood
estimate of $d$ in this estimation problem corresponds to finding a degree
sequence closest to the sequence $\{z_i\}$ in terms of the L1 distance, since
the Laplace log-likelihood equals $-\frac{1}{b} \sum_i |z_i - d_i|$ up to an
additive constant. In essence, we are reconstructing the most likely value of
the degree sequence from the observed noisy answer. The following is the main
result of this section.

Theorem 3. Let $z = \{z_i\}$ be a sequence of real numbers of length $n$. The
degree sequence of the graph $G$ produced by Algorithm 2 solves the
optimization problem $\operatorname*{argmin}_{h \in DS_n} \|h - z\|_1$.

We can obtain the following corollary, which allows us to solve step 4 of
Algorithm 1.

Corollary 1. Let $z = \{z_i\}$ be a sequence of non-increasing integers of
length $n$. The degree partition of the graph $G$ output by Algorithm 2 solves
the optimization problem $\operatorname*{argmin}_{h \in DP_n} \|h - z\|_1$.

In the following algorithm, let
$d^* = \operatorname*{argmin}_{h \in DS_n} \|h - z\|_1$.

Algorithm 2 Input: A sequence $z$ of length $n$ Output: A graph $G$ on $n$
vertices with degree sequence $d^*$
1: Let $G$ be the empty graph on $n$ vertices
2: for $i = 1 \to n$ do
3: Let $pos = |\{j : z_j > 0, i + 1 \le j \le n\}|$
4: Let $h = \min(|z_{(i)}|, pos)$ where $z_{(i)}$ is the $i$-th largest element.
5: Let $\mathcal{I}$ = indices of the $h$ highest values of $|z_j|$ from $i + 1$ to $n$
6: Add edge $(i, k)$ to $G$ for all $k \in \mathcal{I}$
7: Let $z_j = z_j - 1$ for all $j \in \mathcal{I}$
8: end for
9: return $G$
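The loop in Algorithm 2 can be read as a greedy, Havel-Hakimi style construction. The Python sketch below follows one such reading and should be taken as an illustration under explicit assumptions, not as the authors' implementation: targets are rounded to integers, negative targets are treated as 0, and at each step the unprocessed node with the largest remaining target is connected to the unprocessed nodes with the largest positive remaining targets. The function name `greedy_realization` is hypothetical, and the sketch makes no claim of reproducing the optimality statement of Theorem 3.

```python
def greedy_realization(z):
    """Build a simple graph whose degrees are bounded by the rounded targets."""
    # Assumption: integer targets, negative entries clipped to zero.
    residual = {v: max(int(round(x)), 0) for v, x in enumerate(z)}
    edges = []
    while residual:
        # Unprocessed node with the largest remaining target.
        i = max(residual, key=lambda v: residual[v])
        target = residual.pop(i)
        # Nodes that can still accept an edge (the set counted by `pos`).
        candidates = [v for v in residual if residual[v] > 0]
        candidates.sort(key=lambda v: -residual[v])
        h = min(target, len(candidates))
        for k in candidates[:h]:
            edges.append((i, k))
            residual[k] -= 1
    return edges
```

When the rounded input is already graphical, for example `[3, 3, 2, 2, 2]`, the construction reduces to the Havel-Hakimi procedure and reproduces the input degrees. For the non-graphical input `[3, 3, 1, 1]` it returns a star with degrees `[3, 1, 1, 1]`.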



Remark: Given a point $z$, Algorithm 2 finds a point in $DS_n$ that is closest
to $z$ in terms of the L1 distance. There are many differences from the
traditional projection. Firstly, the set $DS_n$ has "holes" in it; for
instance, every point whose L1 norm is not divisible by 2 is not included in
the set. For this reason, the closest point need not be on the boundary of the
convex hull of $DS_n$. Moreover, there can be more than one degree sequence
that solves the same optimization problem. Specifically, the following is true.

Lemma 2. Given any optimal solution $d^*$ to the optimization problem of
Theorem 3, we can obtain another optimal solution by increasing or decreasing
the degrees of a pair of nodes by adding or deleting an edge, as long as each
degree remains bounded pairwise by $|z|$.

Using this property, we can search for an optimal degree sequence that lies
inside the boundary of the convex hull of $DS_n$. This is an important property
for ensuring that the maximum likelihood estimates of the beta model exist, see
Section 5. In the next section, we present the beta model of random graphs
whose sufficient statistics are the degree sequences.

5 Degree sequence and the beta model 

One of the simplest models involving degree sequences of a graph is the beta
model. This model admits many different characterizations, see [1] and
references therein. The beta model arises as a model in the discrete
exponential family of distributions on the space of graphs when the degree
sequence is a sufficient statistic. We can also describe this model in terms of
independent Bernoulli random variables as follows.

Let $\beta$ be a fixed point in $\mathbb{R}^n$. For a random graph on $n$
vertices, let each edge between nodes $i$ and $j$ occur independently of other
edges with probability

$$p_{ij} = \frac{e^{\beta_i + \beta_j}}{1 + e^{\beta_i + \beta_j}}.$$

This model is called the beta model with $\{\beta_i\}$ as the vector of
parameters. The beta model arises as a special case of $p_1$ models and
log-linear models, see [13]. If we ignore the ordering of the nodes, then the
degree partition is also a sufficient statistic for the beta model. In the next
two subsections, we illustrate two common statistical inference tasks that are
associated with the beta model. We will evaluate our algorithm by performing
these tasks on the private version of the degree partition.
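As an illustration of the Bernoulli description, the sketch below samples a graph from the beta model under its standard parametrization, in which the edge between nodes $i$ and $j$ appears with probability $e^{\beta_i + \beta_j} / (1 + e^{\beta_i + \beta_j})$. The function names and the `seed` argument are assumptions made for the example.

```python
import math
import random

def edge_probability(beta_i, beta_j):
    # exp(b_i + b_j) / (1 + exp(b_i + b_j)), written as a logistic function.
    return 1.0 / (1.0 + math.exp(-(beta_i + beta_j)))

def sample_beta_model(beta, seed=None):
    """Draw one simple graph; each edge is an independent Bernoulli trial."""
    rng = random.Random(seed)
    n = len(beta)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < edge_probability(beta[i], beta[j]):
                edges.append((i, j))
    return edges
```

With all parameters equal to 0 every edge has probability 1/2, so the model reduces to a uniform random graph; large positive parameters push the node degrees toward $n - 1$ and large negative ones toward 0.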

5.1 Existence of mle of the beta model 

We would like to have the property that if the maximum likelihood estimates 
of the observed degree partition exist, then the maximum likelihood estimates 
of the private version of the degree partition also exist. Note that under strict 
implementation of differential privacy, this is not allowed, as the answer to the 
query "Does the mle exist" cannot be answered exactly. However, we relax 
this requirement, and our algorithm satisfies this property approximately. More 
specifically, if the mle of the observed degree partition exists, the algorithm 
attempts to output a degree partition whose mle also exists. This is done by 
making use of the property in Lemma 2.

We need an efficient way to check for the existence of the mle. In [12], the
authors provide conditions to check for the existence of the mle of the beta
model; however, their algorithm is not efficient. Here we present a simple and
efficient algorithm to check for the existence of the mle for degree partitions
$d$, which may be of independent interest. We conjecture that this result holds
for the case of degree sequences as well. The following theorem provides
conditions to check for the existence of the mle for the degree partition; it
follows from a standard theorem on exponential families, see [9].

Theorem 4. Let $d$ be a degree partition. The mle of the beta model exists iff
$d \in \mathrm{ri}(\mathrm{conv}(DP_n))$, where $\mathrm{conv}(DP_n)$ is the
convex hull of the set of degree partitions, which is true iff

1. $d_i > 0$ and $d_i < n - 1$ for all $i$.

2. $\sum_{i=1}^{k} d_i - \sum_{i=n-l+1}^{n} d_i < k(n - 1 - l)$ for $1 \le k + l \le n$.

Theorem 4 shows that the mle of the beta model exists iff the degree partition
lies in the relative interior of the convex hull of $DP_n$.
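The conditions of Theorem 4 can be checked directly with prefix sums. The sketch below assumes that condition 2 reads $\sum_{i \le k} d_i - \sum_{i > n - l} d_i < k(n - 1 - l)$ for all $k, l \ge 0$ with $1 \le k + l \le n$, with the degrees sorted in non-increasing order; both the reading of the inequality and the function name `mle_exists` are assumptions of this example.

```python
def mle_exists(degrees):
    """Check the two conditions of Theorem 4 (as read above)."""
    d = sorted(degrees, reverse=True)  # non-increasing order
    n = len(d)
    # Condition 1: every degree lies strictly between 0 and n - 1.
    if any(x <= 0 or x >= n - 1 for x in d):
        return False
    # prefix[k] is the sum of the k largest degrees.
    prefix = [0]
    for x in d:
        prefix.append(prefix[-1] + x)
    total = prefix[-1]
    # Condition 2: strict inequality for every pair (k, l) with 1 <= k + l <= n.
    for k in range(n + 1):
        for l in range(n - k + 1):
            if k + l == 0:
                continue
            top = prefix[k]
            bottom = total - prefix[n - l]  # sum of the l smallest degrees
            if top - bottom >= k * (n - 1 - l):
                return False
    return True
```

For the 5-cycle, `mle_exists([2, 2, 2, 2, 2])` returns `True`. For the path on four nodes, `mle_exists([2, 2, 1, 1])` returns `False`, which agrees with the fact that the two nodes of degree 2 are adjacent in every realization of that partition.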

5.2 Conditional tests and conditionally specified models 

Conditional goodness of fit tests are used to evaluate the fit of any model and
are based on the space $\mathcal{G}(d)$. To perform conditional tests, we need
the released degree partition to be graphical, because if $d$ is not graphical,
then $\mathcal{G}(d)$ is empty. As another example, consider conditionally
specified models of random graphs. In these models, the degree sequence is
treated as a nuisance parameter and one conditions on it. Statistical inference
can be performed by simulating from the space $\mathcal{G}(d)$ of all graphs
with the fixed degree sequence $d$. But $\mathcal{G}(d)$ is empty if $d$ is not
graphical. If $d$ is inside the convex hull of degree sequences, then one can
perform tests based on the set of graphs given the expected degree sequence.
However, if $d$ is outside this convex hull, then $\mathcal{G}(d)$ is empty,
and if $d$ is on the boundary of the convex hull, $\mathcal{G}(d)$ contains a
single element. In all these cases, our algorithm outputs a graphical degree
sequence closest to $d$.

6 Experiments 

In this section, we empirically evaluate our proposed algorithm for releasing
degree partitions (Algorithm 1, called isotone-hh) and compare it with the
algorithm due to [6] (called isotone). In the original algorithm, the authors
use L2 minimization, but we use L1 minimization to be consistent with our
algorithm.

The main goal of these experiments is to evaluate the statistical properties of
the degree partitions produced by these differentially private algorithms. There
are three categories of experiments. In the first, we compare how close the
released degree partition is to the original degree partition. In the second
set of experiments, we are interested in the following basic question: if the
mle exists for the original degree partition, does the mle also exist for the
private version? In the last set of experiments, we evaluate how close the
distribution of the number of triangles in the space of graphs given the
original degree sequence is to that in the space of graphs given the private
degree sequence. This distribution is important for goodness of fit tests for
the beta model. Specifically, this distribution is used to compute the p-values
for goodness of fit tests. We present our results for the karate dataset ([17])
obtained from the UCI network repository. This dataset is a social network of
friendships between 34 members of a karate club at a US university. For the
experiment related to the existence of the mle, we also present our results for
the family of power law graphs.

Remark: In our experiments, we only ask for the degree partition. An 
analyst may be interested in releasing the degree sequence when the order is 
set by some other requirement. In such cases, our algorithm can release a 
graphical degree sequence but the additional constraints of monotonicity no 
longer exist. In simulation experiments, we found that the degree sequence 
released without these additional constraints is still very noisy and not useful 
for statistical inference. This could be due to issues with the algorithm, but 
it could also be that differential privacy requires addition of a large amount of 
noise. Thus in cases where the ordering information is not useful, it is better to 
ask for the degree partition. 

6.1 Existence of MLE of the beta model 

As noted in Section 5, the maximum likelihood estimates of the beta model
exist only when the degree sequence lies in the interior of the polytope of
degree sequences. In this set of experiments, we simulate random graphs with
degree sequences following the power law
$p_x = P(d_i = x) \propto c\,x^{-\gamma}$ for different values of $\gamma$ and
different node sizes. For each simulated degree partition ($d$), we find the
degree partition released by the isotone algorithm and the isotone-hh algorithm
($d_\epsilon$). We compute the probability that the existence of the mle for
the original degree partition coincides with the existence of the mle for the
released degree partition by simulating over the randomness of the Laplace
noise and the random graph model. We used the conditions provided in Theorem 4
to check for the existence of the mle for the degree partition. The results are
shown in Table 1.

Table 1: P(existence of mle of $d_\epsilon$ coincides with that of $d$) for the
power law family of graphs

            Isotone-hh                      Isotone
  n     γ = 1   γ = 1.5   γ = 2       γ = 1   γ = 1.5   γ = 2
 100    0.983   0.997     0.910       0.242   0.240     0.251
 200    0.998   1.000     0.930       0.240   0.239     0.241
 400    1.000   1.000     0.956       0.241   0.240     0.233
 500    1.000   1.000     0.967       0.243   0.243     0.232


From Table 1, we can see that for the isotone algorithm, the existence of the
mle coincides only about 25 percent of the time. On the other hand, the
existence of the mle coincides at least 90 percent of the time for the
isotone-hh algorithm. Table 2 shows the results for the Karate dataset. Again
we can see that the mle exists with high probability for the isotone-hh
algorithm, whereas the mle exists only about 50 percent of the time for the
isotone algorithm. The mean L2 errors of the two algorithms are close to each
other.

Table 2: P(mle exists) and L2 error for Karate Dataset

               P(mle exists)   Mean L2 error
 Isotone-hh    0.998           52.63
 Isotone       0.499           56.57


6.2 Parameter estimates of the beta model 

In the next set of experiments, we evaluate how close the maximum likelihood
estimates of the beta model for the synthetic graph are to those for the
original graph. The comparison is tricky because in more than 50 percent of the
cases, the mle did not exist for the isotone algorithm. In such a case, we
assumed that the parameter estimates are 0. For the degree partition of the
karate dataset, we released the degree partition using the isotone and the
isotone-hh algorithm 500 times. Figure 1 shows the results of the experiment;
it is a plot of the estimates of the $\beta$ parameters on the y axis vs the
node id on the x axis. The red, green, and blue lines indicate the mean value
of the parameter estimates, the maximum likelihood estimates and the 95 percent
confidence intervals of the estimates, respectively. We can see that the
estimates for the isotone algorithm are biased and have higher variance when
compared to the isotone-hh algorithm. This is because the mle does not exist
for many degree partitions released by the isotone algorithm.

Figure 1: Evaluation of algorithms to release degree sequence on the Karate
Dataset

6.3 Empirical Null of number of triangles 

In the last set of experiments, we compare $\mathcal{G}(d)$, the space of
graphs given the original degree sequence, with $\mathcal{G}(d_\epsilon)$, the
space of graphs given the released degree sequence. If the released degree
sequence is not graphical or is an extreme point, this set is either empty or
has a single element. This set is associated with model testing applications
for the beta model. For example, one common procedure for testing the fit of
the beta model is to pick a statistic $T(G)$, say the number of triangles, and
compute the sampling distribution of the number of triangles of random graphs
with the fixed degree sequence. This is in line with the exact tests in the
contingency table literature, where one conditions on the sufficient
statistics. As before, we use the Karate dataset and repeat the experiment 500
times. For each run, we release the degree partition using the isotone and
isotone-hh algorithms and compute the empirical null distribution of the number
of triangles. Figure 2 shows the results for 10 sample runs.

For the isotone algorithm, if the released degree sequence was not graphical,
we output a point mass distribution at an arbitrary point, in this case -10.
The blue, green and red lines in the figure show the distribution of the number
of triangles obtained from the isotone algorithm, the original degree sequence
and the isotone-hh algorithm, respectively. Each panel shows the output from
one random draw. We can see that in many cases, the isotone algorithm fails to
produce a valid distribution. On the other hand, the isotone-hh algorithm
produces a valid distribution which is also close to the true empirical null.
However, there are cases when the empirical null is completely disjoint, for
example, the second figure in the bottom panel.

Figure 2: Distribution of the number of triangles in the Karate dataset



7 Conclusion and Future Work 

In this paper, we presented an algorithm to release a graphical degree sequence 
of a graph in a differentially private manner by adding an additional step to 
the algorithm proposed by [6]. The main motivation for releasing a graphical 
degree sequence is to enable analysts to perform useful statistical inference, for 
example, goodness of fit tests, and maximum likelihood estimation of the beta 
model. We presented simpler conditions for testing the existence of mle for 
the beta model for a degree partition and used these conditions to empirically 
evaluate our algorithm and that of [6] . We found that even though the mle exists 
for the original degree sequence, the mle fails to exist in more than 50 percent 
of the cases for the sequences released by [5] . On the other hand, our proposed 
algorithm performs better, in that the mle exists with very high probability. 
We also compared the effect of other statistical inference procedures such as 
parameter estimation and goodness of fit testing. Both of these are inherently 
tied to the nongraphical nature of the released degree sequence. 

However, there are further issues that need to be addressed. For instance, 
to compute p-values, we need to know the observed number of triangles. Under 
differential privacy, the analyst obtains a private version of observed number 
of triangles, possibly released by using the algorithms provided in [7]. Thus, 
not only is the space of graphs giving rise to the empirical null distribution 
completely disjoint from the original space of graphs, but also the observed 
statistic is a noisy version of the original statistic. More work is needed to
understand the behavior of p-values under such a setting.

Another direction of work would be to release degree sequences for bipartite
and directed graphs. The degree sequences of bipartite graphs form sufficient
statistics for the so-called Rasch models, see for instance [13].

Proof of Theorem 3

In this section, we present the proof of correctness of Algorithm 2. The main
idea behind the proof is that degree sequences can be written as special sums,
which are helpful in selecting directions for building the solution to the L1
optimization problem. We begin with some definitions.

Definition 4 (k-star degree sequence). A degree sequence $d_k$ is said to be a 
k-star degree sequence if there exists a graph $G \in \mathcal{G}(d_k)$ on $n$ vertices such that 
$G$ is a k-star. Note that we allow the graph to have disconnected nodes, especially 
when $k < n$. Let $K_n$ be the set of all k-star degree sequences of length $n$. 

Lemma 3. Every degree sequence $d$ can be written as $d = \sum_i g_i$ where $g_i \in K_n$. 

Proof. Let $d$ be any degree sequence. Consider repeated applications of Theorem 
2 to $d$. Let the residue sequence obtained at step $i$ be $r_i$. It is easy to see 
that this procedure terminates after at most $n$ steps, thus it generates at most 
$n$ residue sequences. Moreover, $r_{i+1}$ is obtained from $r_i$ by reducing $r_i$ with 
a $\max_j r_i(j)$-star sequence. Let $g_i$ be the star sequence used to reduce $r_i$ to 
$r_{i+1}$. Since $d$ is a degree sequence, the last residue sequence will be the zero degree 
sequence. It is easy to see that $d$ can be reconstructed as a sum of the $g_i$, i.e. 
$d = \sum_i g_i$. Since each $g_i$ is a k-star sequence, $g_i \in K_n$. □ 

Definition 5 (Havel Hakimi Decomposition). The Havel Hakimi decomposition 
of a degree sequence $d$ is defined as the set of k-star degree sequences obtained 
by repeated application of Theorem 2 to $d$, and is denoted by $\{d\}$. 
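
The decomposition in Definition 5 can be sketched in code. The following is a minimal illustration, assuming that Theorem 2 is the Havel Hakimi reduction in which the node with the largest residual degree is joined to the nodes with the next largest residual degrees. The function name `havel_hakimi_decomposition` and the tie-breaking by smallest index are choices made for this sketch, not part of the paper.

```python
from typing import List


def havel_hakimi_decomposition(d: List[int]) -> List[List[int]]:
    """Split a graphical degree sequence d into k-star sequences that sum to d.

    Hypothetical helper for illustration. Raises ValueError when d is not
    graphical. Node order is fixed by index; ties go to the smaller index.
    """
    if any(v < 0 for v in d):
        raise ValueError("degrees must be non-negative")
    n = len(d)
    residue = list(d)
    stars: List[List[int]] = []
    while any(residue):
        # Center of the next star: node with the largest residual degree.
        center = max(range(n), key=lambda i: (residue[i], -i))
        k = residue[center]
        # Leaves: the k other nodes with the largest residual degrees.
        others = sorted((i for i in range(n) if i != center),
                        key=lambda i: (-residue[i], i))
        leaves = others[:k]
        if len(leaves) < k or any(residue[i] <= 0 for i in leaves):
            raise ValueError("sequence is not graphical")
        # The k-star sequence: degree k at the center, degree 1 at each leaf.
        star = [0] * n
        star[center] = k
        for i in leaves:
            star[i] = 1
        stars.append(star)
        residue = [r - s for r, s in zip(residue, star)]
    return stars
```

For example, `havel_hakimi_decomposition([3, 2, 2, 2, 1])` yields the three star sequences `[3, 1, 1, 1, 0]`, `[0, 1, 1, 0, 0]` and `[0, 0, 0, 1, 1]`, whose coordinate-wise sum is the input sequence.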

Lemma 3 shows that every degree sequence can be written as a sum of 
k-star sequences, thus every degree sequence has a Havel Hakimi decomposition. 
Further, it is easy to see that the decomposition is unique if the order of the 
nodes is fixed. The next two lemmas allow us to restrict the search for optimal 
degree sequences to the set of degree sequences that are pointwise bounded by 
$z$ after eliminating the negative coordinates of $z$. 

Lemma 4. Let $(z_1, \ldots, z_n)$ be a sequence of real numbers. Let $\mathcal{I} = \{i : 
z_i > 0\}$. Let $f_z(a) = \sum_i |z_i - a_i|$. Let $d$ be any degree sequence such that 
$\arg\min_{a \in DS_n} f_z(a) = d$ and $d(\mathcal{I}^c) > 0$. Then there exists a degree sequence $d^*$ 
such that $d^*(\mathcal{I}^c) = 0$ and $f_z(d) = f_z(d^*)$. 

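Proof. If $d_i = 0\ \forall i \in \mathcal{I}^c$, the lemma is true by letting $d^* = d$. Hence assume 
that there exists at least one $i = j \in \mathcal{I}^c$ such that $d_j > 0$. Let $d^*$ be the degree sequence 
obtained from $d$ by reducing it with a $d_j$-star, so that $d^*_j = 0$, $d^*_i = d_i - 1$ for the 
$d_j$ indices $i$ in a set $\mathcal{S}$, and $d^*_i = d_i$ for $i \in \mathcal{J} = [n] \setminus (\mathcal{S} \cup \{j\})$. 
Next let us show that $f_z(d^*) \le f_z(d)$. 

$$\begin{aligned}
f_z(d^*) &= \sum_i |z_i - d^*_i| \\
&= \sum_{i \in \mathcal{S}} |z_i - d^*_i| + \sum_{i \in \mathcal{J}} |z_i - d^*_i| + |z_j - d^*_j| \\
&= \sum_{i \in \mathcal{S}} |z_i - d_i + 1| + \sum_{i \in \mathcal{J}} |z_i - d_i| + |z_j| \\
&\le \sum_{i \in \mathcal{S}} |z_i - d_i| + \sum_{i \in \mathcal{S}} 1 + \sum_{i \in \mathcal{J}} |z_i - d_i| + |z_j| \\
&= \sum_{i \in \mathcal{S}} |z_i - d_i| + \sum_{i \in \mathcal{J}} |z_i - d_i| + d_j + |z_j| \\
&= \sum_{i \in \mathcal{S}} |z_i - d_i| + \sum_{i \in \mathcal{J}} |z_i - d_i| + |d_j - z_j| \\
&= f_z(d)
\end{aligned}$$

The last two steps use $|\mathcal{S}| = d_j$ and $z_j \le 0 < d_j$. 
But $d$ is such that $\arg\min_{a \in DS_n} f_z(a) = d$, hence $f_z(d^*) = f_z(d)$. If there is more 
than one $j \in \mathcal{I}^c$ such that $d_j > 0$, we can redefine $d^*$ iteratively until there are no 
such $j$ left. 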

□ 
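
As an illustrative example (constructed here, not taken from the paper), the reduction in the proof of Lemma 4 can be checked on three nodes with $z = (2.3, 1.1, -0.7)$, so that $\mathcal{I}^c = \{3\}$. Enumerating the eight degree sequences on three labelled nodes shows that the minimum of $f_z$ is $2.1$ and that $d = (2, 1, 1)$ attains it with $d_3 > 0$. Reducing $d$ with the 1-star joining nodes 3 and 1 gives $d^* = (1, 1, 0)$:

```latex
\begin{aligned}
f_z(d)   &= |2.3 - 2| + |1.1 - 1| + |-0.7 - 1| = 0.3 + 0.1 + 1.7 = 2.1, \\
f_z(d^*) &= |2.3 - 1| + |1.1 - 1| + |-0.7 - 0| = 1.3 + 0.1 + 0.7 = 2.1 .
\end{aligned}
```

Here $d^*(\mathcal{I}^c) = 0$ and $f_z(d^*) = f_z(d)$, as the lemma asserts.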

Lemma 5. Let $(z_1, \ldots, z_n)$ be a sequence of non negative real numbers. Let 
$f_z(a) = \sum_i |z_i - a_i|$. Let $d$ be any degree sequence such that $\arg\min_{a \in DS_n} f_z(a) = 
d$. Then there exists a degree sequence $d^*$ such that $d^*_i \le \lceil z_i \rceil\ \forall i$ and $f_z(d^*) = 
f_z(d)$. 

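Proof. If $d_i \le \lceil z_i \rceil\ \forall i$, the lemma is true by letting $d^* = d$. Hence assume that 
there exists at least one $i = j$ such that $d_j > \lceil z_j \rceil$. Let $d^*$ be defined as follows: 

$$d^*_i = \begin{cases}
\lceil z_i \rceil & \text{for } i = j \\
d_i - 1 & \text{for } i \in \mathcal{I} \\
d_i & \text{for } i \in \mathcal{J}
\end{cases}$$

where $\mathcal{I}$ and $\mathcal{J}$ are any index sets such that $|\mathcal{I}| = d_j - \lceil z_j \rceil$, and $\mathcal{I} \cup \mathcal{J} \cup \{j\} = [n]$. 
Clearly, $d^*$ is a degree sequence because it is obtained by reducing $d$ with a k-star 
sequence, where $k = d_j - \lceil z_j \rceil$. 

Next let us show that $f_z(d^*) \le f_z(d)$. 

$$\begin{aligned}
f_z(d^*) &= \sum_i |z_i - d^*_i| \\
&= \sum_{i \in \mathcal{I}} |z_i - d^*_i| + \sum_{i \in \mathcal{J}} |z_i - d^*_i| + |z_j - d^*_j| \\
&= \sum_{i \in \mathcal{I}} |z_i - d_i + 1| + \sum_{i \in \mathcal{J}} |z_i - d_i| + |z_j - \lceil z_j \rceil| \\
&\le \sum_{i \in \mathcal{I}} |z_i - d_i| + |\mathcal{I}| + \sum_{i \in \mathcal{J}} |z_i - d_i| + |z_j - \lceil z_j \rceil| \\
&= \sum_{i \in \mathcal{I}} |z_i - d_i| + \sum_{i \in \mathcal{J}} |z_i - d_i| + |z_j - \lceil z_j \rceil| + d_j - \lceil z_j \rceil \\
&= \sum_{i \in \mathcal{I}} |z_i - d_i| + \sum_{i \in \mathcal{J}} |z_i - d_i| + d_j - z_j \\
&\le \sum_{i \in \mathcal{I}} |z_i - d_i| + \sum_{i \in \mathcal{J}} |z_i - d_i| + |d_j - z_j| \\
&= f_z(d)
\end{aligned}$$

But $d$ is such that $\arg\min_{a \in DS_n} f_z(a) = d$, hence $f_z(d^*) = f_z(d)$. If there is 
more than one $j$ such that $d_j > \lceil z_j \rceil$, we can redefine $d^*$ iteratively until there 
are no such $j$ left. □ 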

Lemma 5, together with Lemma 4, shows that optimization of the L1 distance between $d$ and $z$ over 
$DS_n$ can be performed by considering degree sequences $d$ that are point-wise 
bounded by $\lceil z \rceil$ and that we can ignore the negative entries of $z$. Thus, from this 
point onwards, we will consider only those degree sequences that are bounded 
by $\lceil z \rceil$ and assume that $z$ has positive entries only. For a set $A$ of degree 
sequences, we will denote by $A_{\le z}$ the set of degree sequences in $A$ point-wise 
bounded by $\lceil z \rceil$. 

The next lemma is the key result that shows that we can always improve the 
L1 distance by replacing the k-star sequences in the Havel Hakimi decomposition 
of any degree sequence by an appropriate k-star sequence. 

Lemma 6. Let $d_0$ be any degree sequence in $DS_{\le z}$ and let $\{d_0\} = \{g_i\}$ be its 
Havel Hakimi decomposition. Let $\{x_k\}$ be the following k-star sequence: 

$$x_1 = \arg\min\{f_z(g) : g \in K_{\le z}\}, \qquad x_{k+1} = \arg\min\Big\{f_z\Big(\sum_{i=1}^{k} x_i + g\Big),\ g \in K_{\le z} \setminus \{x_i\}_{i=1}^{k}\Big\}.$$

Let $d_k$ be defined as the following sequence: if $x_1 \in \{d_0\}$ then $d_1 = d_0$, else 
$d_1 = x_1 + \sum_{i \ne 1} g_i$. Similarly, if $x_k \in \{d_0\}$, then $d_k = d_{k-1}$, else 

$$d_k = \sum_{i=1}^{k} x_i + \sum_{i=k+1}^{n} g_i$$

where $g_i \in \{d_0\}$. Then, $f_z(d_n) \le f_z(d_0)$ and each $d_k \in DS_{\le z}$. 

Proof. Consider 

$$\begin{aligned}
f_z(d_k) - f_z(d_{k+1}) &= \Big\|z - \sum_{i=1}^{k} x_i - \sum_{i=k+1}^{n} g_i\Big\|_1 - \Big\|z - \sum_{i=1}^{k+1} x_i - \sum_{i=k+2}^{n} g_i\Big\|_1 \\
&= \Big\|z - \sum_{i=1}^{k} x_i - g_{k+1}\Big\|_1 - \Big\|z - \sum_{i=1}^{k} x_i - x_{k+1}\Big\|_1 \\
&= f_z\Big(\sum_{i=1}^{k} x_i + g_{k+1}\Big) - f_z\Big(\sum_{i=1}^{k} x_i + x_{k+1}\Big) \\
&\ge 0
\end{aligned}$$

where the last inequality follows from the definition of $x_{k+1}$ as a minimizer. 
Adding these inequalities for $k = 0$ to $k = n - 1$, we get $f_z(d_0) - f_z(d_n) \ge 0$, as 
required. Moreover, each $d_k$ is clearly a degree sequence, as $d_{k+1}$ is obtained from 
$d_k$ by replacing a k-star sequence in its Havel Hakimi decomposition. □ 
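
The summation step at the end of the proof of Lemma 6 is a telescoping sum. Written out, with $d_0, \ldots, d_n$ as in the lemma, every intermediate term cancels:

```latex
\sum_{k=0}^{n-1} \bigl( f_z(d_k) - f_z(d_{k+1}) \bigr)
  = \bigl( f_z(d_0) - f_z(d_1) \bigr) + \bigl( f_z(d_1) - f_z(d_2) \bigr) + \cdots + \bigl( f_z(d_{n-1}) - f_z(d_n) \bigr)
  = f_z(d_0) - f_z(d_n) \;\ge\; 0 .
```

Each summand is non negative, so the total is non negative, which gives $f_z(d_n) \le f_z(d_0)$.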

The next proposition shows how to find the best k-star sequence for the L1 
optimization. 

Proposition 2. Given a non negative sequence $z$, the element in the set $K_{\le z}$ 
that solves the following optimization problem 

$$\min_{g \in K_{\le z}} \|z - g\|_1$$

is the following k-star sequence: if $i^* = \{i : \lceil z_i \rceil = \max_j \lceil z_j \rceil\}$, then $k = 
\lceil z_{i^*} \rceil$. Let $I$ be the index set of the $k$ largest elements of $z$ excluding $i^*$; then there 
is an edge between $i^*$ and $i$ for all $i \in I$. 
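
The selection rule in Proposition 2 can be sketched as follows. This is an illustration under stated assumptions rather than the paper's implementation: the function name `best_star`, the tie-breaking by smallest index, the restriction of leaves to nodes with $\lceil z_i \rceil \ge 1$, and the capping of $k$ at the number of such nodes are all choices made here so that the returned star stays pointwise bounded by $\lceil z \rceil$. The sketch does not verify optimality.

```python
import math
from typing import List, Tuple


def best_star(z: List[float]) -> Tuple[int, List[int], List[int]]:
    """Pick a k-star following the rule stated in Proposition 2.

    Hypothetical helper for illustration. Returns (center, leaves, star),
    where star is the degree sequence of the chosen k-star.
    """
    if any(v < 0 for v in z):
        raise ValueError("z must be non-negative")
    n = len(z)
    ceil_z = [math.ceil(v) for v in z]
    # Center i*: the node with the largest ceil(z_i), smallest index on ties.
    center = max(range(n), key=lambda i: (ceil_z[i], -i))
    # Assumption: a leaf gets degree 1, so it needs ceil(z_i) >= 1.
    candidates = sorted(
        (i for i in range(n) if i != center and ceil_z[i] >= 1),
        key=lambda i: (-z[i], i),
    )
    # k = ceil(z_{i*}), capped (assumption) by the number of usable leaves.
    k = min(ceil_z[center], len(candidates))
    leaves = candidates[:k]
    star = [0] * n
    star[center] = k
    for i in leaves:
        star[i] = 1
    return center, leaves, star
```

For instance, with `z = [2.6, 0.4, 1.2, 0.0, 1.7]` the center is node 0 with `k = 3`, the leaves are nodes 4, 2 and 1 (the three largest remaining entries), and the star sequence is `[3, 1, 1, 0, 1]`.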

We are now ready to present the proof of Theorem 1. 

7.1 Proof of Theorem 1 

Let $d^*$ be the optimal degree sequence. Let $\mathcal{I} = \{i : z_i < 0\}$. By Lemma 4, we 
can set $d^*(\mathcal{I}) = 0$. Thus, it is enough to find the optimal degree sequence $d^*$ 
with respect to the function $f_{z(\mathcal{I}^c)}(d)$. From this point onwards, let us assume 
that $\mathcal{I} = \emptyset$. This is achieved by Step 3 of Algorithm 2. Moreover, from Lemma 5, 
it is enough to consider degree sequences bounded pointwise by $\lceil z \rceil$. Thus, we 
need to find the optimum over the set $DS_{\le z}$. By Lemma 6, we can construct the 
optimal degree sequence over $DS_{\le z}$ by starting with any degree sequence $d_0$ and 
replacing its k-star sequences by the k-star sequences defined in Lemma 6. Since 0 is also a degree 
sequence, let the starting sequence $d_0$ be the zero degree sequence. Then, the 
optimal degree sequence is $\sum_{k=1}^{n} x_k$ where 

$$x_{k+1} = \arg\min\Big\{f_z\Big(\sum_{i=1}^{k} x_i + g\Big),\ g \in K_{\le z} \setminus \{x_i\}_{i=1}^{k}\Big\}.$$

It is easy to see that the optimal k-star sequence for each $x_{k+1}$ 
obtained by Proposition 2 is the same sequence selected by steps 5 and 6 of 
Algorithm 2. 

Theorem 5. Given a graph $G$, we can release its degree sequence differentially 
privately in $O(m + n \log n)$ time. We can also release a synthetic graph 
corresponding to the private degree sequence in $O(m + n \log n)$ time. 